14. Quiz: Epsilon-Greedy Policies
In the previous concept, you learned about \epsilon-greedy policies.
You can think of the agent who follows an \epsilon-greedy policy as always having a (potentially unfair) coin at its disposal, with probability \epsilon of landing heads. After observing a state, the agent flips the coin.
- If the coin lands tails (so, with probability 1-\epsilon), the agent selects the greedy action.
- If the coin lands heads (so, with probability \epsilon), the agent selects an action uniformly at random from the full set of available actions, greedy and non-greedy alike. Because the greedy action belongs to this set, it can still be chosen on a heads flip.
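The coin-flip description translates almost line for line into code. The following is a minimal Python sketch, assuming the action-value estimates for the current state arrive as a plain list `q_values` (one entry per action in \mathcal{A}(s)). The function name `epsilon_greedy_action` is hypothetical. Ties for the greedy action are broken by taking the lowest maximizing index, which is an assumption the text above does not specify.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick an action index for one state by simulating the epsilon-coin.

    q_values: hypothetical list of estimates Q(s, a), one per action in A(s).
    epsilon:  probability that the coin lands heads; must lie in [0, 1].
    rng:      anything with .random() and .randrange(), e.g. random.Random(seed).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        # Heads (probability epsilon): uniform over ALL actions,
        # so the greedy action can still be drawn here.
        return rng.randrange(len(q_values))
    # Tails (probability 1 - epsilon): take the greedy action.
    # max() returns the first maximizing index, so ties go to the lowest index.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [1.0, 3.0, 2.0, 0.5]  # made-up estimates; action 1 is greedy
    counts = [0] * len(q)
    for _ in range(10_000):
        counts[epsilon_greedy_action(q, 0.2, rng)] += 1
    # Expect roughly 85% of picks on action 1 and roughly 5% on each other action.
    print([c / 10_000 for c in counts])
```

With \epsilon = 0.2 and four actions, the heads branch alone gives each action a 0.2/4 = 0.05 chance. The greedy action therefore ends up with about 0.8 + 0.05 = 0.85 of the picks, and each other action with about 0.05.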
In order to construct a policy \pi that is \epsilon-greedy with respect to the current action-value function estimate Q, we need only set
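$$
\pi(a|s) \longleftarrow
\begin{cases}
1-\epsilon+\epsilon/|\mathcal{A}(s)| & \text{if } a = \arg\max_{a'\in\mathcal{A}(s)} Q(s,a') \\
\epsilon/|\mathcal{A}(s)| & \text{otherwise}
\end{cases}
$$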
for each s\in\mathcal{S} and a\in\mathcal{A}(s). Note that \epsilon must always be a value between 0 and 1, inclusive (that is, \epsilon \in [0,1]).
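Equivalently, the policy can be written down as explicit probabilities. Every action in \mathcal{A}(s) receives \epsilon/|\mathcal{A}(s)| from the heads branch, and the greedy action additionally receives the full tails mass 1-\epsilon, for a total of 1-\epsilon+\epsilon/|\mathcal{A}(s)|. Here is a minimal Python sketch of that bookkeeping, assuming Q(s,\cdot) for a single state is supplied as a list. The name `epsilon_greedy_probs` is hypothetical, and ties are broken toward the lowest index.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Return the list pi(a|s) for every action a in A(s) of one state s."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    n = len(q_values)  # |A(s)|
    # Greedy action: first index attaining the maximum of Q(s, .).
    greedy = max(range(n), key=lambda a: q_values[a])
    # Every action, greedy included, gets epsilon / |A(s)| from the "heads" branch.
    probs = [epsilon / n] * n
    # The greedy action additionally gets the full "tails" mass 1 - epsilon.
    probs[greedy] += 1.0 - epsilon
    return probs


if __name__ == "__main__":
    for eps in (0.0, 0.2, 1.0):
        p = epsilon_greedy_probs([1.0, 3.0, 2.0, 0.5], eps)
        print(eps, [round(x, 3) for x in p], "sum =", round(sum(p), 12))
```

For \epsilon = 0.2 and four actions this gives 0.85 for the greedy action and 0.05 for each of the others. The values always sum to 1 because (1-\epsilon) + |\mathcal{A}(s)| \cdot \epsilon/|\mathcal{A}(s)| = 1.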
In this quiz, you will answer a few questions to test your intuition.
SOLUTION:
- (1) epsilon = 0 (the coin never lands heads, so the agent always selects the greedy action)
SOLUTION:
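- (5) This is a trick question! The *true answer* is that none of the values for epsilon satisfy this requirement. For every \epsilon \in [0,1], the greedy action is selected with probability 1-\epsilon+\epsilon/|\mathcal{A}(s)| > 0, so it can never be ruled out.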
SOLUTION:
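- (4) epsilon = 1 (the coin always lands heads, so every action is selected with the same probability 1/|\mathcal{A}(s)|)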
SOLUTION:
- (2) epsilon = 0.3
- (3) epsilon = 0.5
- (4) epsilon = 1
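As a sanity check on the solutions above, the following sketch evaluates the candidate values of \epsilon that appear in them (0, 0.3, 0.5 and 1) against four properties. These are whether the greedy action is certain, whether a non-greedy action is certain, whether the policy is the equiprobable random policy, and whether every action has nonzero probability. It assumes a hypothetical state with four actions. The function name `epsilon_greedy_checks` is illustrative, and exact fractions are used so that equality tests are not affected by floating-point rounding.

```python
from fractions import Fraction


def epsilon_greedy_checks(epsilon, n_actions):
    """Evaluate a few properties of an epsilon-greedy policy in one state.

    Pass epsilon as an int or Fraction (e.g. Fraction(3, 10)) to keep it exact.
    """
    p_other = Fraction(epsilon) / n_actions      # each non-greedy action
    p_greedy = 1 - Fraction(epsilon) + p_other   # the greedy action
    return {
        "always_greedy": p_greedy == 1,          # greedy action is certain
        "never_greedy": p_greedy == 0,           # a non-greedy action is certain
        "equiprobable": p_greedy == p_other,     # uniform random policy
        "every_action_possible": p_other > 0,    # all actions have nonzero probability
    }


if __name__ == "__main__":
    N_ACTIONS = 4  # hypothetical |A(s)|; the conclusions hold for any |A(s)| >= 2
    for eps in (Fraction(0), Fraction(3, 10), Fraction(1, 2), Fraction(1)):
        print(f"epsilon = {float(eps)}:", epsilon_greedy_checks(eps, N_ACTIONS))
```

Only \epsilon = 0 makes the greedy action certain, only \epsilon = 1 yields the equiprobable random policy, every \epsilon > 0 gives each action nonzero probability, and no value makes a non-greedy action certain. This is consistent with the solutions listed above.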